Prediction of protein subcellular multisite localization using a new feature extraction method.
Abstract
A basic problem in proteomics is identifying the subcellular locations of a protein. The problem is complicated by the fact that some proteins exist simultaneously in two or more subcellular locations. Improving the quality of multisite prediction requires effective feature extraction methods. Here, we developed a new feature extraction method based on the pK values and frequencies of amino acids to represent a protein as a real-valued vector. Using this feature extraction method, the multi-label k-nearest neighbors (ML-KNN) algorithm and a variant that assigns different weights to different attributes, known as wML-KNN, were employed to predict multiplex protein subcellular locations. The best overall accuracy was 59.92% on dataset S1 (from the Virus-mPLoc predictor) and 86.04% on dataset S2 (from Gpos-mPLoc).
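The abstract does not give the exact encoding, so the following is only a minimal sketch of the general idea, under stated assumptions. The function names (`composition`, `pk_features`, `weighted_distance`, `predict_labels`) are hypothetical, the way pK values are combined with frequencies (a simple product) is an assumption, and the classifier is a simplified neighbor vote with per-attribute distance weights rather than the full ML-KNN or wML-KNN procedure.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(seq):
    """Return the 20 amino acid frequencies of a sequence (they sum to 1)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def pk_features(seq, pk_table):
    """Assumed encoding: scale each residue frequency by that residue's pK value.

    pk_table maps one-letter codes to pK values and is supplied by the caller;
    no constants are hard-coded here.
    """
    return [f * pk_table[a] for f, a in zip(composition(seq), AMINO_ACIDS)]


def weighted_distance(x, y, weights):
    """Euclidean distance with one non-negative weight per attribute."""
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def predict_labels(query, train_x, train_y, weights, k=3):
    """Simplified multi-label k-NN vote (not the full ML-KNN estimator).

    A label is predicted when more than half of the k nearest neighbors carry it.
    """
    order = sorted(
        range(len(train_x)),
        key=lambda i: weighted_distance(query, train_x[i], weights),
    )
    neighbors = order[:k]
    votes = Counter(label for i in neighbors for label in train_y[i])
    return {label for label, n in votes.items() if n > len(neighbors) / 2}
```

In use, each training protein would be encoded with `pk_features` using a pK table taken from a biochemistry reference, and `train_y` would hold one set of location labels per protein, so a single query can receive several locations at once.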
Related articles
Multisite protein subcellular localization prediction based on entropy density.
Protein subcellular localization prediction is currently receiving much attention in the field of protein research. Many researchers have made great efforts to study single-site protein subcellular localization, but experimental data show that many proteins can be found in two or more subcellular locations, prompting the study of multisite protein subcellular localization. This study utilized...
Feature Weighting-based Classifier for Protein Subcellular Localization
Protein subcellular localization prediction plays an important role for understanding the functions and biological processes that proteins are involved in. By using protein sequence information, we can predict where a protein belongs to. In this paper, we propose a new linear classifier for predicting subcellular localizations of proteins using improved features extracted from protein sequences...
Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine
Predicting the destination of a protein in a cell is important for annotating the function of the protein. Recent advances have allowed us to develop more accurate methods for predicting the subcellular localization of proteins. One of the most important factors for improving the accuracy of these methods is related to the introduction of new useful features for protein sequences. In this paper...
Sequence-driven features for prediction of subcellular localization of proteins
Prediction of the cellular location of a protein plays an important role in inferring the function of the protein. Feature extraction is a critical part in prediction systems, requiring raw sequence data to be transformed into appropriate numerical feature vectors while minimizing information loss. In this paper we present a method for extracting useful features from protein sequence data. The ...
Prediction of Protein Subcellular Multi-locations with a Min-Max Modular Support Vector Machine
Predicting the subcellular multi-locations of proteins with machine learning techniques is a challenging problem in the computational biology community. Regarding the protein multi-location problem as a multi-label pattern classification problem, we propose a new prediction method for the protein subcellular localization problem in this paper. Two key points of the proposed method are ...
Journal:
- Genetics and molecular research : GMR
Volume 15, Issue 3
Publication date: 2016